Final Report : Power Price Prediction#

Contributors#

  • Arjun Radhakrishnan

  • Sneha Sunil

  • Gaoxiang Wang

  • Mehdi Naji

Executive Summary#

Numerous Alberta-based organizations rely heavily on energy to fuel their business operations and need an effective forecasting tool that offers accurate and interpretable predictions. Our business solution addresses this need with an interpretable and explainable data science product, deployed on the cloud and designed specifically for power price prediction in the Alberta energy market. It equips organizations to make informed decisions about their energy purchases by forecasting hourly energy prices for the next 12 hours, supplemented with confidence intervals. Notably, our forecasting system demonstrated a 23.21% improvement in prediction accuracy over the current system, which only forecasts the next 6 hours. Our solution also addresses the lack of interpretability and explainability of predictions, showcased in an intuitive Tableau dashboard.

Introduction#

Over the past few decades, the electricity market in the province of Alberta has undergone a significant transformation, shifting from a regulated to a competitive, deregulated environment where prices are determined by the interplay of supply and demand in a competitive marketplace. Prices are influenced by various market participants, including power generators, transmission companies, and retailers. Consequently, the deregulated nature of the market plays a crucial role in the volatility of power prices. The interactive plot below showcases this volatility and serves as a reminder of the inherent complexity of the data science problem to be addressed.

Energy-intensive industries in Alberta heavily depend on accurate price predictions to plan for future energy costs and optimize their operations. In light of the growing uncertainty caused by escalating price volatility, the significance of accurate electricity price predictions cannot be emphasized enough. These predictions play a pivotal role in enabling stakeholders, including energy buyers, to navigate the market successfully by strategizing their operations efficiently. Currently, these organizations depend on the energy forecasting tool published by the Alberta Electric System Operator (AESO) to determine their energy costs in advance. AESO is the independent system operator responsible for operating Alberta's electrical grid, facilitating the competitive electricity market, and managing the entire power distribution system for the province. The power pool price for every hour is finalized by AESO based on supply and demand. However, the forecasts currently published by AESO cover only 6 hours into the future, remain volatile, and offer no interpretation or model visibility.

To reduce their expenses, companies could plan ahead and potentially explore alternative energy options if they had access to accurate forecasts that cover a longer window and are also interpretable and explainable. To address these challenges, our product offers a comprehensive solution that empowers organizations in Alberta to effectively analyze costs, optimize energy purchases, and ultimately maximize their profits. The scientific objectives of our product are:

  • Forecasting energy prices for the next 12 hours

  • Interpretability and explainability of predictions

  • Scalable and real-time prediction pipeline

  • Reactive Tableau Dashboard for visualizing the real-time results

Data Science Techniques#

Our project utilized two primary data sources:

Open-source Tableau Data: We had access to historical hourly time series data published by AESO in Tableau.

Open-source API Data: AESO provides a public API service that grants us access to real-time and historical data for selective features.

Our initial dataset consisted of approximately 110 features and ~72,000 rows, covering various aspects such as power generation/supply by fuel type, energy load distribution across regions, import and export data, and system marginal price data. The primary target we focused on forecasting was the power pool price (CAD), which represents the balanced average power price per hour for the province of Alberta as determined by AESO. It is capped between 0 and 1,000 CAD to keep the Alberta electricity market stable and fair. Our feature set predominantly comprises numerical data, with the exception of a single ordinal feature that we engineered.

Examining the plots reveals consistent trends in energy prices. On weekdays, prices are higher during peak hours (8 am to 8 pm) due to increased demand from business operations. Weekdays also show lower prices during off-working hours. However, weekends have a different pattern, with higher prices in the evenings. These observations are supported by the autocorrelation function plots, which clearly demonstrate daily seasonality in energy prices. To capture the combined effects of day of the week and peak/off-peak hours, an ordinal feature called ‘weekly_profile’ was created to represent time-related variables and energy pricing dynamics.

The plot below clearly shows daily seasonality (a spike at every 24-hour lag).
Note: the ACF plot depicts the correlation between the price and its lagged values.

Data preprocessing#

For modeling purposes, we partitioned our data into two subsets: a training set from January 1st, 2021 to May 25th, 2023 and a testing set from May 26th to May 31st, 2023. Given the absence of real-time data for all features through the API, we leveraged historical data to simulate a real-time prediction system. When real-time data becomes accessible, clients can seamlessly swap the data sources, allowing real-time data to flow into the model and produce real-time predictions.

In our pursuit to accurately predict future prices based on historical values of influential factors, we transformed the time-series data of all the features into a tabular format by creating lagged versions of both the target variable and the relevant features. Each lagged version corresponds to a specific hour, with the price for that hour being used as the target variable.
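This reshaping step can be sketched with pandas; the column names below are illustrative, not the project's actual schema:

```python
import pandas as pd

def make_lagged_table(df, target="pool_price",
                      feature_cols=("load", "gas_supply_mix"),
                      n_lags=24, horizon=1):
    """Turn an hourly time series into a tabular supervised dataset:
    each row holds the past `n_lags` values of the target and features,
    and the label is the pool price `horizon` hours ahead."""
    cols = {}
    for col in (target, *feature_cols):
        for lag in range(1, n_lags + 1):
            cols[f"{col}_lag_{lag}"] = df[col].shift(lag)
    cols["y"] = df[target].shift(-horizon)   # future price as the label
    return pd.DataFrame(cols).dropna()       # drop rows with incomplete lags
```

For a 12-hour horizon, the same lagged table can be paired with twelve differently shifted targets, one per forecast step.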

Feature Selection and Engineering#

To select features out of the 110 available, our primary strategy involved examining the correlations between the various features and the price. Given the importance of interpretability in our model, we also conducted comprehensive market research and engineered several key features showing a significant correlation with the price. One such feature is the gas reserve margin: a buffer of gas readily available to generate electricity to meet sudden load demands and peak hours. As evidenced in our data visualizations, a dwindling gas reserve tends to correspond with an increase in price. Another is the gas supply mix, the proportion of total energy generation produced from gas: when the supply relies mostly on gas, the price increases, as gas is costly compared to the other sources.

For more information about the key engineered features, please check out the Key Engineered features section on the left.

We further refined our feature set by leveraging the coefficients from training an Elastic Net CV model and the feature importances deduced from training a Random Forest Regressor model.

Pursuing a second strategy, we investigated the correlation between lagged features and future prices projected for periods ranging from 1 to 12 hours. We identified features exhibiting correlations of absolute value greater than 0.3 and incorporated them into our feature set. Interestingly, both strategies resulted in almost identical sets of features.
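The lagged-correlation screen can be sketched as follows (the feature names used here are hypothetical):

```python
import pandas as pd

def select_by_lagged_correlation(df, target="pool_price",
                                 horizons=range(1, 13), threshold=0.3):
    """Keep every feature whose Pearson correlation with the price
    h hours ahead exceeds `threshold` in absolute value for at least
    one horizon h in 1..12."""
    selected = set()
    for h in horizons:
        future_price = df[target].shift(-h)           # price h hours ahead
        corr = df.drop(columns=[target]).corrwith(future_price)
        selected |= set(corr[corr.abs() > threshold].index)
    return sorted(selected)
```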

Modelling Strategy#

Since the price is volatile, our first scientific objective, forecasting energy prices for the next 12 hours, seemed complex. Hence, we needed models apt for time series forecasting that can pick up temporal patterns. As a baseline, we chose the SARIMA (Seasonal Autoregressive Integrated Moving Average) model for univariate forecasting, as it is a popular classical time series forecasting technique that incorporates both autoregressive and moving average components along with seasonality. It also supports probabilistic forecasting and the generation of confidence intervals.

However, as we encountered limitations in accurately capturing the complex dynamics of energy price fluctuations solely with the SARIMA model, we decided to transition to more sophisticated machine-learning models to improve the accuracy of our forecasts by incorporating the key engineered features.

For our forecasting horizon of 12 steps, we implemented the direct strategy: training 12 individual models on the same historical data up to the cut-off point, each responsible for predicting the price at a specific timestep within the horizon (1, 2, 3, … 12). The cut-off hour refers to the specific point in time up to which the data is used to train the model. In this approach, Model 1 consistently predicts the power price for the next timestep, while Model 12 forecasts the price 12 timesteps into the future. By adopting this approach, we avoided the accumulation of errors and controlled the error propagation that can occur in the recursive strategy. Additionally, whenever a new data point became available, we advanced the cut-off time by 1 hour and retrained all 12 models on the most recent data. This allowed us to continuously incorporate real-time data and ensure reliable future predictions. To build our base pipeline, we leveraged sktime, a widely used Python package for time series forecasting.
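A dependency-light sketch of the direct strategy, using scikit-learn's LinearRegression as a stand-in for the actual forecasters (column names are illustrative; sktime's reduction forecasters automate this pattern):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def fit_direct_models(X, y, horizon=12):
    """Direct strategy: one model per forecast step. Model h is trained
    to predict the price h hours past the cut-off."""
    models = {}
    for h in range(1, horizon + 1):
        y_h = y.shift(-h).dropna()   # target shifted h steps ahead
        X_h = X.loc[y_h.index]       # align features with shifted target
        models[h] = LinearRegression().fit(X_h, y_h)
    return models

def forecast(models, x_latest):
    """Predict the next len(models) hours from the latest feature row."""
    return {h: float(m.predict(x_latest)[0]) for h, m in models.items()}
```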

Cross Validation Split#

For cross-validation, it was crucial to keep validation time to a minimum while covering a diverse range of temporal patterns and potential scenarios. To capture the seasonality in price variation over time, we initially trained our models on data from January 1st, 2022 to December 31st, 2022 and validated them on the entire month of January 2023, split into 63 folds. Since our data is time series data, preserving the temporal order within each fold was crucial. The first fold consisted of an initial training window of one year and a validation set spanning 12 hours. We made predictions for those 12 hours and compared them with the actual prices in the validation set to calculate the errors. We then expanded the training window to include the 12 hours from the validation set and proceeded to predict the next 12 hours. This process was repeated for a total of 63 folds.
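A minimal, index-based sketch of this expanding-window scheme (the defaults mirror the setup described above: roughly one year of hourly training data, 12-hour validation windows; sktime's ExpandingWindowSplitter provides the same behaviour):

```python
def expanding_window_folds(n_obs, initial_train, horizon=12, n_folds=63):
    """Return (train_indices, val_indices) pairs that preserve temporal
    order: each fold validates on the next `horizon` hours, then absorbs
    them into the training window for the following fold."""
    folds = []
    train_end = initial_train
    for _ in range(n_folds):
        if train_end + horizon > n_obs:
            break                      # ran out of data
        folds.append((list(range(train_end)),
                      list(range(train_end, train_end + horizon))))
        train_end += horizon           # expand the training window
    return folds
```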

Experimented models#

Our initial choice for modeling was Elastic Net CV, a linear regression model that effectively handles multicollinearity and supports feature selection. Given our emphasis on interpretability, a linear model was a suitable option, as it offers a straightforward interpretation of coefficients and relationships between variables. In addition to Elastic Net CV, we also considered XGBoost and LightGBM, powerful gradient-boosting algorithms that excel in efficiency, accuracy, handling large datasets, and managing multicollinearity. LightGBM in particular satisfied all our requirements, including scalability and efficient model fitting. It especially excelled when loaded on a GPU, enabling faster training times for large datasets. Notably, LightGBM supports warm initialization, enabling rapid model refitting on new data, which was essential for our real-time forecasting scenario.

Evaluation Metric#

To assess model performance, we selected the Root Mean Square Error (RMSE) as our evaluation metric because it aligns with our project’s focus on interpretability and is easily understandable: the error value is on the same scale as the actual values, measured in CAD. We use two variations of RMSE as evaluation metrics in our project.

  • The first metric, referred to as the stepwise error, involves generating 12-step predictions into the future for each hour. For each prediction made at a specific step into the future (referred to as “n” step predictions), we calculate the RMSE by comparing the predicted values with the actual values recorded at those corresponding hours. For example, we compare our 1-step predictions with the actual values to obtain the step 1 error.

  • The second metric we employ is the average error per prediction. With this metric, we calculate the RMSE for each prediction instance that comprises 12 forecasts. Subsequently, we compute the average of these RMSE values.
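The two metrics can be sketched as follows, assuming the forecasts are collected in an array with one row per prediction origin and one column per step into the future:

```python
import numpy as np

def stepwise_rmse(preds, actuals):
    """Metric 1: one RMSE per forecast step, computed down the columns.
    `preds` and `actuals` have one row per prediction origin and one
    column per step into the future."""
    return np.sqrt(np.mean((preds - actuals) ** 2, axis=0))

def avg_error_per_prediction(preds, actuals):
    """Metric 2: RMSE of each multi-step forecast (across the row),
    averaged over all prediction origins."""
    return np.sqrt(np.mean((preds - actuals) ** 2, axis=1)).mean()
```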

Cross Validation Results#

Model         Avg RMSE (CAD)   RMSE SD (CAD)
LightGBM      82.962592        79.157233
ARIMA         88.812002        67.579395
Elastic Net   89.302110        66.732290
XGBoost       94.000088        70.299748

The table presents the average and standard deviation of errors per prediction calculated across 63 folds during cross-validation for the specified models.

As seen from the above results, all the models performed reasonably well according to the RMSE metric, but LightGBM excelled given our specific needs. Its computational efficiency, rapid fit times, and warm-start capability, which retains and updates previously learned state, made it ideal for quick model updates whenever AESO published new data. Additionally, LightGBM captured spikes in prices, whereas the other models struggled with such sudden changes, producing flatter predictions. LightGBM also supports quantile regression, enabling us to generate confidence intervals for our predictions. All in all, its superior performance and adaptability made LightGBM the best of the experimented models for our forecasting pipeline.

Using sktime, we encountered a challenge: refitting all 12 LightGBM models became time-consuming. To address this, we implemented custom code that reduced refit times from approximately 4.5 minutes to less than 0.5 seconds. Despite a slight increase in RMSE, this optimization was a significant achievement, allowing us to overcome the computational burden and improve the efficiency of our pipeline.
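Our custom refit code is project-specific, but the underlying idea, warm-starting from the already-fitted ensemble instead of retraining from scratch, can be sketched with scikit-learn's GradientBoostingRegressor (LightGBM exposes the same capability through the `init_model` argument of its fit APIs):

```python
from sklearn.ensemble import GradientBoostingRegressor

def warm_refit(model, X_new, y_new, extra_rounds=10):
    """Continue boosting from the already-fitted ensemble instead of
    retraining from scratch, keeping hourly refits cheap."""
    model.set_params(warm_start=True,
                     n_estimators=model.n_estimators + extra_rounds)
    return model.fit(X_new, y_new)
```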

After cross-validation, we tuned LightGBM’s L1 and L2 regularization hyperparameters. We increased the number of boosting rounds and reduced the learning rate, following recommended practices. To prevent overfitting, we set the maximum depth to 15 instead of allowing the trees to grow to the bottommost level.

Interpretability of predictions#

To achieve our second scientific objective of obtaining feature importance and interpreting our model’s predictions, we relied on the SHAP (SHapley Additive exPlanations) framework. SHAP enables us to explain and interpret individual predictions: for each prediction the model makes, we can obtain the SHAP value of every feature, quantifying that feature’s impact on and contribution to the prediction. To quantify the uncertainty of our predictions, we used quantile regression to obtain 95% confidence intervals.
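As an illustration of how SHAP values compose into a prediction, consider the additivity property (all numbers and feature names below are hypothetical, not taken from the real model):

```python
# Hypothetical SHAP decomposition for a single 1-step price prediction.
base_value = 95.0                  # average model output across training data (CAD)
shap_values = {                    # per-feature contributions to this prediction (CAD)
    "gas_reserve_margin": 31.2,    # low reserve pushes the price up
    "gas_supply_mix": 12.4,
    "weekly_profile": -4.1,
    "load_lag_1": 8.0,
}

# SHAP's additivity property: prediction = base value + sum of contributions
prediction = base_value + sum(shap_values.values())   # 142.5

# Percentage change vs. the base score, as surfaced on the dashboard
pct_impact = {f: 100 * v / base_value for f, v in shap_values.items()}
```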

For more information, please check the Appendix section.

Test Results#

Our modeling pipeline outperformed AESO’s model across all timesteps, as shown in the table. Our prediction pipeline is superior not just in short-term 1-step and 2-step forecasts but maintains lower RMSE values right up to 12-step forecasts. Although RMSE typically grows as we forecast farther timesteps, our model consistently shows a comparatively lower range of RMSE than AESO, even for higher-step predictions. This implies that our model’s accuracy does not degrade significantly as the forecast horizon extends.

Below are the stepwise errors for the time range of May 26th to May 31st, 2023. Our model’s predictions beat AESO’s by ~23.21%, even while predicting twice as many steps into the future.

Comparison of our Stepwise RMSE with AESO#

The training data spans from January 1, 2021, to May 25, 2023, while the test set covers the period from May 26 to May 31.

The average error per prediction from May 26th to May 31st, 2023 is 101.16 CAD.
The average stepwise RMSE over the same period is 118.53 CAD.
Stepwise RMSE (CAD); AESO's forecasts only cover 6 steps ahead:

Step      AESO RMSE   Our RMSE
1         104.79      102.52
2         116.36      114.58
3         143.97      121.74
4         152.41      122.43
5         191.53      126.86
6         217.49      135.85
7         —           125.12
8         —           122.44
9         —           114.28
10        —           111.14
11        —           114.86
12        —           110.59
Average   154.42      118.53

Prediction Animation#

The animation above shows the predictions made by our model for the test set. The predictions are made for every hour from May 26th to May 31st.

Data Product#

Our data product offers a scalable, real-time prediction pipeline that accurately forecasts energy prices for the next 12 hours. These predictions come with a high level of explainability, allowing our partners to make well-informed decisions regarding their energy purchases. To facilitate this, we have developed a reactive Tableau dashboard that empowers users to easily access and visualize real-time results.

The dashboard is structured into three primary sections to enhance user experience. At the top, the first section features a comprehensive 24-hour energy price timeline chart which provides a holistic view of energy prices by displaying actual prices for the previous 12 hours and forecasted prices for the upcoming 12 hours. Additionally, each hourly prediction is accompanied by a 95% confidence interval. Moving to the lower left, the second section presents a dynamic bar chart that updates as users hover over specific predictions in the first chart. The chart highlights the top four features and offers a deeper understanding of the key factors driving each prediction. Finally, the third section showcases a time series plot illustrating four significant global factors that exhibit a correlation with energy prices.

For more information on the architecture, please check the Appendix section.

(Tableau dashboard screenshot)

Conclusion and Recommendations#

In Alberta’s liberalized market, predicting power prices is an intricate task, as prices rely on a balance between supply and demand intertwined with multiple influencing factors. The high volatility, which mixes seasonality with otherwise unclear patterns, further compounds the challenge of accurate forecasting as the prediction window extends to 12 hours. Another crucial aspect of our project has been the constant tug-of-war between model accuracy and interpretability. While simpler models enhance interpretability, they lack the complexity to effectively address the problem. Our current LightGBM model strikes a middle ground, but the ideal balance remains an exploratory challenge, with numerous potential models yet to be examined.

During the development phase, we encountered challenges while using the sktime package. Moving forward, it could be advantageous to develop certain components in-house, which would allow us to optimize our pipelines and have greater control over them. Initially, we considered two model approaches: one using features known for the next 12 timesteps and another using only unknown features. Due to the lack of reliable future data, we chose the latter. Once real-time data is available, the two models could be combined into a more effective ensemble for better results. The ~23% improvement reported in the test results is based on only 5 days of test data; expanding the testing period would provide a more comprehensive assessment. Additionally, advanced machine learning models such as transformers could be explored to find the right balance between accuracy and interpretability. Despite the numerous challenges encountered, we were successful in achieving our scientific objectives.

Appendix#

Interpretability of Predictions#

The local interpretability of predictions obtained using SHAP values helps us understand the factors driving individual predictions. To establish a reference point for comparison, we used a base value: the average value of predictions made by the model. This base value acts as a benchmark, allowing us to evaluate the impact of each feature in relation to the model’s expected output. SHAP values can be positive or negative, indicating the direction and magnitude of influence: a positive SHAP value means the feature pushed the prediction above the base score, while a negative value indicates a negative influence on the prediction.

In our dashboard visualizations, we explicitly showcase the percentage increase or decrease contributed by each feature relative to the base score. This gives a clear understanding of how much each feature contributes to the current prediction, allowing for intuitive interpretation and explanation of the model’s behavior.

To obtain the confidence interval, we trained two separate LightGBM models with quantile regression as the objective. The first model was trained to predict the upper limit of the interval, with the desired quantile set to 0.975 (the 97.5th percentile of the distribution); the second was trained to predict the lower limit, with the quantile set to 0.025 (the 2.5th percentile). After training, we used the predict() method to obtain the predicted quantiles, which form the upper and lower limits of the 95% confidence interval. This provided a measure of uncertainty and allowed us to communicate the level of confidence associated with our forecasts, facilitating better decision-making and risk assessment.
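A minimal sketch of this interval construction, using scikit-learn's quantile loss as a stand-in for LightGBM's `objective='quantile'` (the data and model here are illustrative):

```python
from sklearn.ensemble import GradientBoostingRegressor

def fit_interval_models(X, y):
    """Train two quantile regressors that bracket a 95% confidence
    interval: one at the 97.5th percentile, one at the 2.5th."""
    upper = GradientBoostingRegressor(loss="quantile", alpha=0.975).fit(X, y)
    lower = GradientBoostingRegressor(loss="quantile", alpha=0.025).fit(X, y)
    return lower, upper

def predict_interval(lower, upper, X_new):
    """Return (lower_limit, upper_limit) arrays for each prediction."""
    return lower.predict(X_new), upper.predict(X_new)
```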

Deployment Architecture#

In order to make our machine learning pipeline scalable, we have deployed our product in Databricks. Our architecture involves two main jobs running within Databricks. We also have two storage units - one for storing the predictions for the dashboard and the second one for archiving all the predictions made by the model. The first job serves as our initial training pipeline, responsible for training the model using the training dataset and subsequently saving the trained model. We also store the model predictions, upper and lower limits, as well as the prediction explanations. These details are stored in both the real-time predictions table and the archive table.

In addition to the initial training pipeline, we have implemented an update job that runs on an hourly basis. This job retrieves the new actual power price for the past hour, typically published by the AESO API. However, due to current limitations in data availability, we simulate this process using historical data. The update job leverages these new values to refit the data in all 12 models, thereby generating the next set of predictions. These updated predictions are then seamlessly integrated with Tableau, allowing us to promptly update our visualizations and plots.

Databricks is highly scalable and capable of handling data-intensive processes, making it an ideal computing engine. Tableau, meanwhile, stands out for its user-friendly maintenance and extensive built-in features, making it an efficient tool for visualizing predictions; it integrates seamlessly with various data sources, enabling easy access and analysis within a unified interface. However, there are certain challenges to consider. Currently, our Databricks cluster’s computation speed is slower than that of our personal laptops due to the lack of GPU support in the current configuration. Additionally, certain features available in Python’s Plotly are either absent or harder to implement in Tableau, given its non-code-based approach. Using the Dash library in combination with Plotly could have provided more advanced and customizable visualizations, which we were unfortunately unable to explore due to time constraints.

(Deployment architecture diagram)

Transformations explored for the power price#

To address the inherent challenges of our target variable, such as extreme volatility, we attempted several transformation techniques:

Log Transformation: Given the skewed nature of the price data, a logarithmic transformation was used with an intent to normalize its distribution. This transformation could potentially make patterns in the data more interpretable and better meet the assumptions of downstream modeling techniques.

Standard Scaler: The standard scaler transformation was applied to standardize the price values. This technique could help minimize the effect of outliers and varying scales across different features by scaling the values to have a mean of 0 and a standard deviation of 1.

Discretization: The discretization transformation was used as it often complements tree-based models such as LightGBM, improving their performance by turning continuous features into several bins, each representing a range of price values.
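The three transformations can be sketched in a few lines of NumPy (the price values below are illustrative, not real observations):

```python
import numpy as np

# Illustrative hourly pool prices (CAD); the real series is capped at 0-1000
price = np.array([0.0, 45.5, 100.0, 620.3, 999.0])

# Log transformation: log1p handles the zero floor of the price cap
log_price = np.log1p(price)

# Standard scaling: zero mean, unit standard deviation
scaled = (price - price.mean()) / price.std()

# Discretization: 10 equal-width bins spanning the 0-1000 price cap
bin_ids = np.digitize(price, np.linspace(0.0, 1000.0, 11))
```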

Despite these efforts, none of these transformations led to significant improvements in our model’s predictive performance. Consequently, we decided to proceed with the original scale of the price variable. This approach might also simplify the interpretation and communication of the model results, as the predictions will be in the same scale as the original data.